
VoiceBand

What is it?
[Screenshot of VoiceBand]
From top to bottom:
  • The time-domain data (oscilloscope view) of the incoming sound
  • A short-time frequency analysis of the incoming sound
  • A piano keyboard, with the red line marking the current fundamental frequency of the incoming sound. The exact frequency is printed just below the keyboard
  • A visual profiling tool for the multithreaded job system. The green block shows how much time is spent detecting the pitch, the red blocks are the pitch-shifting jobs, and the white blocks are the remaining jobs (e.g. the last white block on the first row is the output job). There is one row per available core; the screenshot was taken on a dual-core machine, so there are two rows in total

VoiceBand is my general-purpose audio processing research application, and it is still very much under construction.

Currently, it can do the following things in real time, with low latency:
  • Record audio from a microphone
  • Playback wave files
  • Add echo (a minimal sketch of such a node follows this list)
  • Apply a low-pass filter (and, by extension, high-pass and band-pass filters)
  • Detect the pitch of a single voice (singing or monophonic instruments), using this method
  • Change pitch without changing duration (using this free phase vocoder; improvements are planned)
  • Take input from a MIDI keyboard
  • Mix multiple audio blocks together
  • Compress clipping audio
  • Playback processed audio
  • Save processed audio to a wave file
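
As an illustration of how small one of these building blocks can be, here is a minimal sketch of an echo node: a feedback delay line, where out[n] = in[n] + feedback * out[n - delay]. The class name and interface here are hypothetical, not VoiceBand's actual API.

    #include <cstddef>
    #include <vector>

    // A feedback delay line: each output sample is written back into the
    // buffer, so the echo repeats and decays. delaySamples must be > 0.
    class EchoNode {
    public:
        EchoNode(std::size_t delaySamples, float feedback)
            : buffer_(delaySamples, 0.0f), feedback_(feedback) {}

        // Process one block of audio in place.
        void process(float* samples, std::size_t count) {
            for (std::size_t i = 0; i < count; ++i) {
                samples[i] += feedback_ * buffer_[pos_]; // mix in the delayed signal
                buffer_[pos_] = samples[i];              // store output for later echoes
                pos_ = (pos_ + 1) % buffer_.size();
            }
        }

    private:
        std::vector<float> buffer_; // holds the last delaySamples output samples
        float feedback_;            // echo level, typically between 0 and 1
        std::size_t pos_ = 0;       // circular write/read position
    };
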
All of these operations are just building blocks (nodes) in the software architecture, so they can be combined into a processing graph to create interesting programs. Because I use a graph system in which each node depends on its predecessors, nodes that can execute at the same time are automatically scheduled across multiple processors where available, making more efficient use of multicore machines.
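
To make the scheduling idea concrete, here is a rough sketch of how dependency-driven scheduling can work: each node counts its unfinished producers, and a pool of worker threads (ideally one per core) pulls nodes whose count has dropped to zero. The Node and Scheduler types below are simplified illustrations, not the actual job system.

    #include <atomic>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct Node {
        std::function<void()> process;     // the audio work for one block
        std::vector<Node*> dependents;     // nodes that consume this node's output
        std::atomic<int> pendingInputs{0}; // producers not yet finished this block
        int inputCount = 0;                // total producers, used to reset the counter
    };

    class Scheduler {
    public:
        explicit Scheduler(unsigned threads = std::thread::hardware_concurrency()) {
            for (unsigned i = 0; i < threads; ++i)
                workers_.emplace_back([this] { workerLoop(); });
        }
        ~Scheduler() {
            { std::lock_guard<std::mutex> lock(m_); done_ = true; }
            cv_.notify_all();
            for (auto& t : workers_) t.join();
        }

        // Push one audio block through the graph; sources have no inputs.
        void runBlock(std::vector<Node*>& all, std::vector<Node*>& sources) {
            remaining_ = static_cast<int>(all.size());
            for (Node* n : all) n->pendingInputs = n->inputCount;
            for (Node* n : sources) enqueue(n);
            std::unique_lock<std::mutex> lock(m_);
            blockDone_.wait(lock, [this] { return remaining_ == 0; });
        }

    private:
        void enqueue(Node* n) {
            { std::lock_guard<std::mutex> lock(m_); ready_.push(n); }
            cv_.notify_one();
        }
        void workerLoop() {
            for (;;) {
                Node* n = nullptr;
                {
                    std::unique_lock<std::mutex> lock(m_);
                    cv_.wait(lock, [this] { return done_ || !ready_.empty(); });
                    if (done_) return;
                    n = ready_.front();
                    ready_.pop();
                }
                n->process();
                // A dependent whose last input just finished becomes ready to run.
                for (Node* d : n->dependents)
                    if (--d->pendingInputs == 0) enqueue(d);
                std::lock_guard<std::mutex> lock(m_);
                if (--remaining_ == 0) blockDone_.notify_all();
            }
        }

        std::vector<std::thread> workers_;
        std::queue<Node*> ready_;       // nodes whose inputs are all complete
        std::mutex m_;
        std::condition_variable cv_, blockDone_;
        std::atomic<int> remaining_{0}; // nodes left to process this block
        bool done_ = false;
    };

In a scheme like this, each worker thread corresponds to one row in the profiler shown at the top of the page.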

Currently, the graphs need to be created programmatically, but in the future, I intend to create a UI that lets you drag and connect nodes to quickly set up and modify various processing graphs.
Example (pitch-corrected singing):

[Processing graph for example 1]

With this graph, the input will be pitch-corrected to the nearest semitone on the equal-tempered scale. Since each node depends on the completion of one or more previous nodes, no parallelisation takes place.
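
The correction step itself is just a little math: measure how far the detected frequency is from A4 (440 Hz) in semitones, round to the nearest whole semitone, and derive the ratio to hand to the pitch shifter. A minimal sketch (the function name is illustrative, not VoiceBand's actual code):

    #include <cmath>

    // Snap a detected fundamental frequency (Hz) to the nearest semitone on
    // the equal-tempered scale (A4 = 440 Hz), and return the frequency ratio
    // that the pitch-shift node should apply.
    double correctionRatio(double detectedHz) {
        double semitonesFromA4 = 12.0 * std::log2(detectedHz / 440.0);
        double targetHz = 440.0 * std::pow(2.0, std::round(semitonesFromA4) / 12.0);
        return targetHz / detectedHz; // 1.0 means the note was already in tune
    }
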

This is what the current version sounds like (singing recorded on non-professional equipment, so excuse the noise):
Original: Hallelujah.mp3
With added background vocals: output1.mp3

It's not perfect, but as a proof of concept, it will do for now.
Example (MIDI-keyboard-controlled background voices):

[Processing graph for example 2]

With this graph, voices will be added to your own singing, based on which keys are pressed on a MIDI keyboard.
Because of the dependencies and structure, the three pitch-shifting nodes (easily the most CPU-intensive part of this graph) will be executed simultaneously on three cores, if available.
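
Using the hypothetical Node type from the scheduler sketch above, the wiring of this graph could look roughly like this; note that the three shift nodes depend on the pitch detector and the MIDI node, but not on each other:

    // Wiring example 2's graph, reusing the hypothetical Node type from the
    // scheduler sketch above (the process callbacks are omitted for brevity;
    // in real code the nodes would live in the graph object, not a function).
    void buildExample2Graph() {
        static Node mic, midi, pitchDetect, shift1, shift2, shift3, mixer, output;

        auto connect = [](Node& from, Node& to) {
            from.dependents.push_back(&to);
            ++to.inputCount; // one more producer for 'to' to wait for
        };

        connect(mic, pitchDetect);    // detect the pitch of the live voice
        connect(pitchDetect, shift1); // each shifter needs the detected pitch...
        connect(pitchDetect, shift2);
        connect(pitchDetect, shift3);
        connect(midi, shift1);        // ...and a target note from the keyboard
        connect(midi, shift2);
        connect(midi, shift3);
        connect(mic, mixer);          // the dry voice...
        connect(shift1, mixer);       // ...plus three pitch-shifted copies
        connect(shift2, mixer);
        connect(shift3, mixer);
        connect(mixer, output);

        // shift1..shift3 have no edges between them, so once pitchDetect and
        // midi finish, all three become ready at once and can run on up to
        // three cores simultaneously.
    }
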

This is what the current version sounds like (singing recorded on non-professional equipment, so excuse the noise):
Original: Hallelujah.mp3
With added background vocals: output_Prototype2.mp3

Again, it's not perfect, but it will do as a starting point for further research.

Future plans
Things I'd like to do in the future with this framework:
  • Use a time-domain pitch-shifting algorithm (PSOLA or similar) instead of the phase vocoder I use now
  • Do proper formant correction/restoration when changing pitch
  • Work on making the backing vocals sound more human (introduce slight pitch wavering, volume changes, and timing offsets; a rough sketch of the idea follows this list)
  • Work on making the backing vocals sound different (modify the formants and other characteristics so they sound like different voices)
  • Add a node to add automatic vibrato to the input
  • Add a UI so I can drag and visually connect nodes more easily
  • Introduce parallelism within CPU-heavy nodes to make more efficient use of multicore machines
  • ...
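
For the humanisation item above, the core idea is simple: give each backing voice a slowly drifting pitch ratio and gain instead of machine-perfect constants. The sketch below illustrates one way to do that; the struct, parameter names, and ranges are guesses of mine, not a spec.

    #include <cmath>
    #include <random>

    // Wobble a backing voice's pitch ratio and gain slightly over time, so it
    // stops sounding machine-perfect: a slow sine waver plus a little random
    // jitter per block.
    struct Humanizer {
        double lfoPhase = 0.0;
        std::mt19937 rng{std::random_device{}()};
        std::normal_distribution<double> jitter{0.0, 0.002};

        // Called once per audio block; fills in multipliers for this block.
        void nextBlock(double blockSeconds, double& pitchMul, double& gainMul) {
            lfoPhase += 2.0 * 3.14159265358979 * 5.0 * blockSeconds; // ~5 Hz waver
            pitchMul = 1.0 + 0.003 * std::sin(lfoPhase) + jitter(rng);
            gainMul  = 1.0 + 0.05 * std::sin(lfoPhase * 0.37);       // slower volume drift
        }
    };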